# Relevant Data and Code Information for Pescosolido, Lee, and Kafadar, 2020, "Cross-level sociodemographic homogeneity alters individual risk for completed suicide." 

This repository provides all the R and Stata code needed to reproduce the figures and tables in Pescosolido, Lee, and Kafadar (2020), "Cross-level sociodemographic homogeneity alters individual risk for completed suicide," PNAS, forthcoming.

If you have any problems or issues replicating our figures/tables, please contact Byungkyu Lee (bl11@indiana.edu).

Run `META_multilevel_suicide.R` to reproduce all tables and figures.

# Some remarks on data sources

1. NVDRS data are restricted and require an application to the CDC and an approved, project-specific Data Use Agreement. Because we are not allowed to post or share suicide case data from NVDRS, we instead provide the R code that cleans the raw NVDRS data (`1_clean_NVDRS.R`). Note that this code may not work with later releases of the NVDRS data. To request access to the data, visit the NVDRS website:
  * https://www.cdc.gov/violenceprevention/datasources/nvdrs/datapublications.html

2. The ACS PUMS data are available from the US Census Bureau’s ACS website (https://www.census.gov/programs-surveys/acs/data/pums.html), the FTP site (https://www2.census.gov/programs-surveys/acs/data/pums/), and data.census.gov. Because the raw data are too large to share, we have posted the processed and compressed ACS PUMS data (`processed/microACS_full_v1.fst`), which can be regenerated by running `2_clean_ACS_PUMS.R`.

3. The longitudinal RCMS data are available at the following website. We include these data in our Dataverse under `data/rawdata/RCMS`.
  * http://www.thearda.com/Archive/Files/Descriptions/RCMSMGCY.asp

4. To create matching weights that assign PUMAs to counties in the ACS PUMS data, we downloaded two PUMA-to-county link files from the Geocorr 2014 website (http://mcdc.missouri.edu/applications/geocorr2014.html). Both files (2000 PUMA to county and 2012 PUMA to county) are included in the `rawdata/georelated` folder.

5. Other county-level data sets (i.e., population count, in- and out-migration rates, land area, and poverty) were obtained from the US Census Bureau websites listed below. The processed files are included in the `rawdata/county` folder.
  * landarea : https://www.census.gov/library/publications/2011/compendia/usa-counties-2011.html
  * poverty : https://www.census.gov/data/developers/data-sets/Poverty-Statistics.html
  * migration rate : https://www.census.gov/topics/population/migration/guidance/county-to-county-migration-flows.html
  * population count : https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/ and https://www2.census.gov/programs-surveys/popest/datasets/2000-2010/intercensal/county/co-est00int-tot.csv

# Workflow to replicate results in the paper

After you set your working directory in `META_multilevel_suicide.R`, you should be able to reproduce all figures and tables in the paper. The scripts run in the following order:

## data cleaning 
0. `0_convert_files_to_fst.R` : convert all files to fst format (`fst` files are much faster to read and write in R than common formats such as CSV)
1. `1_clean_NVDRS.R` : clean NVDRS data
2. `2_clean_ACS_PUMS.R` : clean ACS PUMS data
3. `3_clean_RCMS.R` : clean RCMS data
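As a rough illustration of step 0, the sketch below round-trips a small data frame from CSV into `fst`, assuming the CRAN `fst` package is installed. The toy data and temporary file paths are hypothetical; `0_convert_files_to_fst.R` itself may read other input formats.

```r
# Minimal sketch of a CSV -> fst conversion, assuming the CRAN 'fst' package.
# The toy data frame and temporary paths are hypothetical placeholders.
library(fst)

raw <- data.frame(county = c("c1", "c2", "c3"), deaths = c(4L, 0L, 7L),
                  stringsAsFactors = FALSE)
csv_path <- tempfile(fileext = ".csv")
fst_path <- tempfile(fileext = ".fst")
write.csv(raw, csv_path, row.names = FALSE)

# Convert: read the slow text file once, then store it in fst format
dat <- read.csv(csv_path, stringsAsFactors = FALSE)
write_fst(dat, fst_path, compress = 50)

# Later reads use read_fst(), which can load only the columns you need
dat_fst <- read_fst(fst_path, columns = c("county", "deaths"))
```

Reading selected columns is one reason `fst` suits large files such as the processed ACS PUMS extract.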

## combine data 
4. `4_create_county_weight_for_ACS.R` : using the Geocorr 2014 matching files, create county weights for living individuals in the ACS PUMS data
  * weight for ACS controls : `PerWgt` * (# in PUMA j living in county i) / (# in PUMA j)
  * weight for NVDRS cases : 1
5. `5_merge_NVDRS_ACS_RCMS_county.R` : combine NVDRS + ACS PUMS + RCMS + other county measures
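The county weighting in step 4 can be sketched in base R as follows. This is a minimal illustration under stated assumptions: the toy persons, the PUMA codes `A`/`B`, the counties `c1`/`c2`, and the column name `afact` (standing in for Geocorr's allocation factor, the share of a PUMA's population living in a county) are hypothetical, not the actual variables used in `4_create_county_weight_for_ACS.R`.

```r
# Hypothetical toy ACS persons: PUMA code and person weight (PerWgt)
acs <- data.frame(person = 1:3, puma = c("A", "A", "B"),
                  PerWgt = c(100, 50, 80), stringsAsFactors = FALSE)

# Hypothetical PUMA-to-county link; afact = share of PUMA j's population in county i
link <- data.frame(puma   = c("A", "A", "B"),
                   county = c("c1", "c2", "c2"),
                   afact  = c(0.6, 0.4, 1.0), stringsAsFactors = FALSE)

# A person in a PUMA that spans several counties gets one row per county
acs_cty <- merge(acs, link, by = "puma")
acs_cty$weight <- acs_cty$PerWgt * acs_cty$afact  # weight for ACS controls
acs_cty$case <- 0L                                 # living controls

# NVDRS suicide cases enter with weight 1 (hypothetical toy row)
nvdrs <- data.frame(person = 101L, puma = NA_character_, PerWgt = NA_real_,
                    county = "c2", afact = NA_real_, weight = 1, case = 1L)
combined <- rbind(acs_cty, nvdrs)  # rbind matches data frame columns by name

# Weighted ACS population by county
county_pop <- tapply(acs_cty$weight, acs_cty$county, sum)
```

Because each PUMA's allocation factors sum to 1 in this toy link file, splitting a person across counties preserves the total weighted ACS population.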

## imputation and create data table for regression analysis
6. `6_impute_reg_table.R` : use the _mice_ package to impute missing values for each county i and year t

7. `7_create_reg_table_main.R` : create regression tables for the main data set
7. `7_create_reg_table_imputed.R` : create regression tables for the imputed data set (M=10)
  * create county-level measures by aggregating individual data with the weights
  * measure "sameness" by assigning each individual the county-level measure for that individual's own category
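One way to compute a "sameness" measure is sketched below, assuming it is the weighted share of a county's population that falls in the individual's own category. The data, the `race` variable, and all column names are hypothetical; the actual measures built in the `7_create_reg_table_*.R` scripts may differ.

```r
# Hypothetical individual records: county, one sociodemographic category, weight
ind <- data.frame(county = c("c1", "c1", "c1", "c2", "c2"),
                  race   = c("white", "white", "black", "white", "black"),
                  w      = c(10, 30, 60, 50, 50), stringsAsFactors = FALSE)

# Weighted population of each county x category cell, and of each county
ind$cell_pop   <- ave(ind$w, ind$county, ind$race, FUN = sum)
ind$county_pop <- ave(ind$w, ind$county, FUN = sum)

# County-level share of the individual's own category ("sameness")
ind$same_race <- ind$cell_pop / ind$county_pop
```

Using `ave()` keeps the result aligned with the original row order, so the measure can be attached directly to each individual record.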

## run logistic regression models using Stata
8. `8_model_execute_command.R` : called from the META file, this script runs the bash script `8_stata_run_bash_cb.sh` on IU's Slate space, which in turn invokes the `8_run_regression_model.do` file below.
8. `8_run_regression_model.do` : combines the regression data tables (county-level measures + imputed data) created by the R code, runs six logit models with and without multiple imputation, and computes the marginal effects.
  * We run the models this way because computing the marginal effects takes a long time.
  * output : `logit_1.smcl` through `mi_10.smcl`, included in the `results/log` folder.
8. `8_table_regression_all_table.do` : produces all summary and regression tables (with odds ratios) in the Appendix.
  * output : `logit_appendix_table.csv` and `mi_appendix_table.csv` for Tables S4-S6 in the Supporting Information
  * output : `table_appendix_0819.txt` includes information about all summary tables
  
## produce all figures for marginal effects from Stata log files 
9. `9_transform_from_smcl_to_txt.do` : converts the SMCL log files to plain text so that R can read the results.
9. `9_figure_pnas_all.R` : reads the Stata log files and produces Figures 1, 2, and 3 and Figure S2 in the SI
  * this file uses custom functions defined in `9_function_figure.R` to read the Stata log files and plot the figures
  * output : all figure PDF files in the `results/figure` folder.
9. `9_figure_category_distribution.R` : produces Figure S1 in the SI.
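`9_function_figure.R` contains the authors' own log-reading functions. What follows is only a hypothetical sketch of how estimates might be pulled from a plain-text Stata `margins` table. It assumes result rows look like the toy excerpt below: an integer `_at` label, a `|`, then the margin, standard error, z, p-value, and confidence bounds. The function name `parse_margins` and all numbers are made up.

```r
# Hypothetical plain-text excerpt of a Stata margins table (toy numbers)
log_lines <- c(
  "------------------------------------------------------------------------------",
  "             |            Delta-method",
  "             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]",
  "-------------+----------------------------------------------------------------",
  "         _at |",
  "          1  |   .0012345   .0001000    12.35   0.000     .0010385    .0014305",
  "          2  |   .0023456   .0002000    11.73   0.000     .0019536    .0027376",
  "------------------------------------------------------------------------------"
)

parse_margins <- function(lines) {
  # Keep only result rows: an integer label, a "|" separator, then a number
  rows <- grep("^\\s*\\d+\\s*\\|\\s*-?\\.?\\d", lines, value = TRUE, perl = TRUE)
  parts <- strsplit(rows, "|", fixed = TRUE)
  labels <- trimws(vapply(parts, `[`, character(1), 1))
  # Split the numeric part on whitespace and convert each field to numeric
  fields <- strsplit(trimws(vapply(parts, `[`, character(1), 2)), "\\s+")
  values <- do.call(rbind, lapply(fields, as.numeric))
  out <- data.frame(at = as.integer(labels), values)
  names(out) <- c("at", "margin", "se", "z", "p", "ci_lo", "ci_hi")
  out
}

res <- parse_margins(log_lines)
```

The resulting data frame can then be passed to a plotting package such as `ggplot2`.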

# Required Packages in Stata and R

## Stata 
To run the Stata do-files, install the following packages:

1. Reporting marginal changes (SPost13)
  + net from http://www.indiana.edu/~jslsoc/stata/
  + net install spost13_ado 

2. Reporting regression table 
  + ssc install estout, replace 

3. Estimating fixed effects models (fast)
  + ssc install reghdfe, replace

## R 
All the following packages should be installed in advance.

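```r
base_package = c(
  'rio', 'haven', 'data.table', 'fst', 'readr', 'readxl', # for data i/o
  'stringr', 'refinr',                                    # for data cleaning: occupation fields
  'mice',                                                 # run multiple imputations ('parallel' ships with base R)
  'tidyverse', 'hrbrthemes', 'extrafont', 'viridis', 'ggpubr', 'ggsci' # for plotting figures
)
# install only the packages that are not installed yet
to_install = setdiff(base_package, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)
```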
